On comparing clusterings: an element-centric framework unifies overlaps and hierarchy

Authors

  • Alexander J. Gates
  • Ian B. Wood
  • William P. Hetrick
  • Yong-Yeol Ahn
Abstract

Clustering is one of the most universal approaches for understanding complex data. A pivotal aspect of clustering analysis is quantitatively comparing clusterings; clustering comparison is the basis for tasks such as clustering evaluation, consensus clustering, and tracking the temporal evolution of clusters. For example, the extrinsic evaluation of clustering methods requires comparing the uncovered clusterings to planted clusterings or known metadata. Yet, as we demonstrate, existing clustering comparison measures have critical biases which undermine their usefulness, and no measure accommodates both overlapping and hierarchical clusterings. Here we unify the comparison of disjoint, overlapping, and hierarchically structured clusterings by proposing a new element-centric framework: elements are compared based on the relationships induced by the cluster structure, as opposed to the traditional cluster-centric philosophy. We demonstrate that, in contrast to standard clustering similarity measures, our framework does not suffer from critical biases and naturally provides unique insights into how the clusterings differ. We illustrate the strengths of our framework by revealing new insights into the organization of clusters in two applications: the improved classification of schizophrenia based on the overlapping and hierarchical community structure of fMRI brain networks, and the disentanglement of various social homophily factors in Facebook social networks. The universality of clustering suggests far-reaching impact of our framework throughout all areas of science.

Similar resources

A Split-Merge Framework for Comparing Clusterings

Clustering evaluation measures are frequently used to evaluate the performance of algorithms. However, most measures are not properly normalized and ignore some information in the inherent structure of clusterings. We model the relation between two clusterings as a bipartite graph and propose a general component-based decomposition formula based on the components of the graph. Most existing mea...

A Generalized Policy Support System and Its Hierarchy Semantics

One common characteristic of many Policy Support Systems (PSSs) is their dependency on the concept of hierarchy. Hierarchy does not need to be limited to a hierarchy of roles (subject centric) as in traditional Role-Based Access Control (RBAC). Instead, it can be applied to other aspects of PSS such as object, environment, purpose and so on. In this paper, we propose a new generalized model for...

Simultaneous Visualization of Clusterings

While there are a number of approaches for the visualization of hierarchical clusterings, as well as subsets in general, there exist only few attempts to visualize different clusterings simultaneously in the same drawing. In this thesis we lay the theoretical foundations for the simultaneous visualization of two or more clusterings. We establish a class-hierarchy that allows us to characterize ...

Comparing Clusterings by the Variation of Information

This paper proposes an information theoretic criterion for comparing two partitions, or clusterings, of the same data set. The criterion, called variation of information (VI), measures the amount of information lost and gained in changing from clustering C to clustering C′. The criterion makes no assumptions about how the clusterings were generated and applies to both soft and hard clusterings....
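
As a rough illustration of the quantity described above, the following is a minimal sketch that computes the variation of information for two hard clusterings, using the standard identity VI(C, C′) = H(C | C′) + H(C′ | C). The function name `variation_of_information`, the flat label-list input format, and the use of natural logarithms are assumptions made for this example; the sketch does not cover the soft-clustering case mentioned in the abstract.

```python
from collections import Counter
from math import log


def variation_of_information(labels_a, labels_b):
    """Return VI(A, B) = H(A|B) + H(B|A) in nats.

    labels_a and labels_b are equal-length sequences; position i holds the
    cluster label of element i in each (hard) clustering.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same elements")
    n = len(labels_a)
    if n == 0:
        return 0.0

    size_a = Counter(labels_a)               # cluster sizes in A
    size_b = Counter(labels_b)               # cluster sizes in B
    joint = Counter(zip(labels_a, labels_b))  # overlap sizes |a ∩ b|

    vi = 0.0
    for (a, b), n_ab in joint.items():
        p_ab = n_ab / n
        # n_ab / size_a[a] is p(b | a); n_ab / size_b[b] is p(a | b).
        vi -= p_ab * (log(n_ab / size_a[a]) + log(n_ab / size_b[b]))
    return vi


if __name__ == "__main__":
    planted = [0, 0, 1, 1]
    found = [0, 1, 0, 1]
    print(variation_of_information(planted, found))
```

The value is zero when the two clusterings agree up to a relabeling of clusters and grows as information is lost and gained in moving from one to the other; in this example the two clusterings are statistically independent, so the result is 2 ln 2.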

Learning Multiple Hierarchical Relational Clusterings

Three important generalizations of the basic clustering problem are relational, hierarchical, and multiple clustering. This paper proposes the first approach to clustering that unifies all three. We describe a general probabilistic model for relational clustering, and show that flat, hierarchical and multiple relational clustering models are special cases. This paper also describes an efficient...

Journal title:
  • CoRR

Volume abs/1706.06136  Issue

Pages  -

Publication date 2017